Jamba is a state-of-the-art large language model built on a hybrid SSM-Transformer architecture. It combines the strengths of attention mechanisms with the Mamba state space model, supports a 256K-token context length, and is suitable for inference on a single 80GB GPU.
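The model can be loaded through the Transformers library. Below is a minimal loading and generation sketch; the model ID `ai21labs/Jamba-v0.1`, the bfloat16 dtype, and the example prompt are assumptions for illustration, not an official recipe.

```python
# Minimal sketch: load Jamba via Transformers and generate text.
# Assumptions: model ID "ai21labs/Jamba-v0.1", a GPU with bfloat16 support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single 80GB GPU
    device_map="auto",           # place weights on the available GPU(s)
)

# Encode a prompt and generate a short continuation.
inputs = tokenizer("A hybrid SSM-Transformer model is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```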